Bandits

Neural Information Processing Systems

For each arm $a$, let $r(a)$ and $c_j(a)$ be, respectively, the mean reward and the mean resource-$j$ consumption, i.e., $(r(a);\, c_1(a), \ldots, c_d(a)) := \mathbb{E}_{o \sim D_a}[o]$. We sometimes write $r = (r(a) : a \in [K])$ and $c_j = (c_j(a) : a \in [K])$ as vectors over arms.

Second, we use a tighter version of Eq. (3.6) (see Appendix D.3):